Formally verified compilation of low-level C code. (Compilation formellement vérifiée de code C de bas-niveau)

Author

  • Pierre Wilke
Abstract

Bytes:
  memval ::= Byte(b) | Pointer(b, i, n) | Undef

Memory chunks:
  memory_chunk ::= Mint8signed | Mint8unsigned      8-bit integers
                 | Mint16signed | Mint16unsigned    16-bit integers
                 | Mint32                           32-bit integers or pointers
                 | Mfloat32                         32-bit floats
                 | Mint64                           64-bit integers
                 | Mfloat64                         64-bit floats

Operations:
  alloc m lo hi = (m′, b)
      Allocates a fresh block with bounds [lo, hi[.
  free m b = ⌊m′⌋
      Frees (invalidates) the block b.
  load κ m b i = ⌊v⌋
      Reads consecutive bytes (as determined by κ) at block b, offset i of memory state m. If successful, returns the contents of these bytes as value v.
  store κ m b i v = ⌊m′⌋
      Stores the value v as one or several consecutive bytes (as determined by κ) at offset i of block b. If successful, returns an updated memory state m′.
  loadbytes m b i n = ⌊mvl⌋
      Reads n consecutive bytes from memory state m, starting at location (b, i). If successful, returns a list of memvals.
  storebytes m b i mvl = ⌊m′⌋
      Stores the bytes from mvl in memory state m, starting at location (b, i). If successful, returns an updated memory state m′.
  size_chunk κ
      Returns the size (number of bytes) that κ holds.
  bounds m b
      Returns the bounds [lo, hi[ of block b.
  nextblock m
      Returns the identifier of the next block to be allocated.
  contents m b
      Returns the contents of block b as a finite map from offsets to memvals.

Figure 2.4: Operations over memory states

The memory itself is not a direct mapping from locations to values; instead, it is a mapping from locations to abstract bytes called memvals (see Figure 2.4). This makes it possible to reason about byte-level accesses to the memory. A memval is a byte-sized quantity that can be one of the following: Undef represents an uninitialised byte, Byte(b) represents the concrete byte (8-bit integer) b, and Pointer(b, i, n) represents the n-th byte of the binary representation of the pointer ptr(b, i).

The memory model defines four main memory operations: load, store, free and alloc.
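The memval and chunk definitions of Figure 2.4 can be mirrored in a few lines of Python. This is only an illustrative sketch, not CompCert's Coq development: the class names, the `SIZES` table, and the helpers `encode_int` and `encode_pointer` are hypothetical, the chunk sizes are derived from the bit widths listed in the figure, and the little-endian byte order is an assumption that the text above does not state.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Union


@dataclass(frozen=True)
class Undef:
    """An uninitialised byte."""


@dataclass(frozen=True)
class Byte:
    """A concrete byte (8-bit integer)."""
    value: int


@dataclass(frozen=True)
class Pointer:
    """The n-th byte of the binary representation of ptr(block, offset)."""
    block: int
    offset: int
    n: int


Memval = Union[Undef, Byte, Pointer]


class Chunk(Enum):
    Mint8signed = "Mint8signed"
    Mint8unsigned = "Mint8unsigned"
    Mint16signed = "Mint16signed"
    Mint16unsigned = "Mint16unsigned"
    Mint32 = "Mint32"
    Mfloat32 = "Mfloat32"
    Mint64 = "Mint64"
    Mfloat64 = "Mfloat64"


# Sizes in bytes, derived from the bit widths given in Figure 2.4.
SIZES = {
    Chunk.Mint8signed: 1, Chunk.Mint8unsigned: 1,
    Chunk.Mint16signed: 2, Chunk.Mint16unsigned: 2,
    Chunk.Mint32: 4, Chunk.Mfloat32: 4,
    Chunk.Mint64: 8, Chunk.Mfloat64: 8,
}


def size_chunk(chunk: Chunk) -> int:
    """Number of bytes that a chunk holds."""
    return SIZES[chunk]


def encode_int(chunk: Chunk, n: int) -> List[Memval]:
    """Split an integer into Byte memvals (little-endian: an assumption)."""
    return [Byte((n >> (8 * i)) & 0xFF) for i in range(size_chunk(chunk))]


def encode_pointer(block: int, offset: int) -> List[Memval]:
    """A pointer is stored as symbolic bytes, one per byte of a Mint32 chunk."""
    return [Pointer(block, offset, n) for n in range(size_chunk(Chunk.Mint32))]
```

Keeping pointer bytes symbolic, instead of turning them into concrete integers, is what lets a byte-level copy of a pointer be reassembled into the same ptr(b, i) afterwards.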
The load and store operations are parameterised by a memory chunk κ, which concisely describes the number of bytes to be fetched or written and the signedness of the value. An access at location (b, o) with chunk κ is aligned if size_chunk κ divides o [3]. For instance, the size of the chunk Mint32 is 4 bytes, hence an integer can be accessed with this chunk at offsets that are multiples of 4. These operations are partial, i.e. they may fail, for example when the access is out of bounds or misaligned, or when the value and the chunk are inconsistent. This is modelled by the option type: we write ∅ for failure and ⌊x⌋ for a successful return of value x.

[3] This condition is slightly too strong: a 64-bit float variable only needs to be accessed at addresses that are multiples of 4, not 8.

The memory model also defines lower-level memory access operations, namely loadbytes and storebytes, which access the memory at the byte level, i.e. they load and store lists of memvals from and to the memory.

The free operation frees a given block. It fails when the given block either has never been allocated or has already been freed. The alloc operation allocates a new block of the requested size. It never fails, thus modelling an infinite memory.

The nextblock property of memory states gives the identifier of the next block to be allocated. It usually serves as a threshold for deciding whether blocks are valid (i.e. have been allocated). The contents property gives access to the internal structure of memory states and returns the finite map corresponding to a given block, which associates a memval to each offset. The bounds function returns the bounds of a given block. These accessors (nextblock, contents and bounds) are not intended to be used in the semantics of the intermediate languages. Rather, using them in formal specifications gives strong connections that we will make use of later in this thesis.

2.5.2.3 Pointer Arithmetic

A location (b, i) is valid for a memory m (written valid(m, b, i)) if the offset i lies within the bounds of the block b. It is weakly valid (written weakly_valid(m, b, i)) if it is either valid or just one byte past the end of its block. This accounts for a subtlety of the C standard, which states that pointers one-past-the-end of an object deserve a particular treatment, namely that they can be compared to the other pointers to this object. This is intended to make looping over an entire array easier, by allowing the current pointer to be compared to the pointer just one-past-the-end.

Example 2.5.1 (Valid and weakly valid pointers). Consider a block b with bounds [0; 4[. Then, pointers ptr(b, 0) and ptr(b, 3) are valid (and, a fortiori, weakly valid). Pointer ptr(b, 4) is not valid; however, it is weakly valid. Pointer ptr(b, 5) is neither valid nor weakly valid.

Pointer arithmetic is defined in Figure 2.5. The only defined operations on pointers are the addition of an integer offset to a pointer, the subtraction of an integer offset from a pointer, and the subtraction of two pointers that point to the same object. Comparisons are also defined between pointers to the same object. All operations not described are undefined (they return undef). Note that, starting from pointer ptr(b, i), it is not possible to reach a pointer to a different block via pointer arithmetic, as blocks are separated by construction.

2.5.3 Memory Transformations

Each of the compilation passes of CompCert is proved correct independently. For most of the passes, this amounts to showing a forward simulation between the source program and the target program, as explained in Section 2.4. Proving a forward simulation requires exhibiting some relation R over program states, and then proving that R is a forward simulation.
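Returning to the alignment condition and to the valid and weakly valid locations of Section 2.5.2.3, the definitions can be checked on a small example. The following is a sketch under stated assumptions: a memory state is reduced to a hypothetical dictionary from block identifiers to bounds (lo, hi), standing for [lo, hi[, and `ptr_diff` only reflects the prose rule that subtracting pointers is defined within a single block. Figure 2.5 is not reproduced in this excerpt, so no further side conditions are modelled.

```python
from typing import Dict, Optional, Tuple

# Hypothetical stand-in for a memory state: block id -> (lo, hi), meaning [lo, hi[.
Bounds = Dict[int, Tuple[int, int]]
Ptr = Tuple[int, int]  # (block, offset)


def valid(bounds: Bounds, b: int, i: int) -> bool:
    """The offset i lies within the bounds of block b."""
    if b not in bounds:
        return False
    lo, hi = bounds[b]
    return lo <= i < hi


def weakly_valid(bounds: Bounds, b: int, i: int) -> bool:
    """Valid, or exactly one byte past the end of the block."""
    return valid(bounds, b, i) or (b in bounds and i == bounds[b][1])


def aligned(chunk_size: int, o: int) -> bool:
    """An access at offset o is aligned if the chunk size divides o."""
    return o % chunk_size == 0


def ptr_add(p: Ptr, n: int) -> Ptr:
    """Adding an integer offset never changes the block."""
    b, i = p
    return (b, i + n)


def ptr_diff(p: Ptr, q: Ptr) -> Optional[int]:
    """Defined only within one block; None plays the role of undef."""
    (b1, i1), (b2, i2) = p, q
    return i1 - i2 if b1 == b2 else None


# A block 1 with bounds [0, 4[: offsets 0 and 3 are valid,
# 4 is only weakly valid, and 5 is neither.
mem = {1: (0, 4)}
checks = [(i, valid(mem, 1, i), weakly_valid(mem, 1, i)) for i in (0, 3, 4, 5)]
```

Because `ptr_add` keeps the block component unchanged, no sequence of additions can turn a pointer into block 1 into a pointer into another block, which matches the separation property stated above.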


Similar resources

Expressiveness and Data-Flow Compilation of OpenMP Streaming Programs

We present a dataflow extension of OpenMP to express highly dynamic control and data flow over nested, dependent tasks. The language supports dynamic creation, modular composition, variable and unbounded sets of producers/consumers, separate compilation, and first-class streams. These features, enabled by our original compilation flow, allow translating high-level parallel programming patterns, like dep...


Deriving Imperative Code from Functional Programs Patrice Quinton, Sanjay Rajopadhye, Doran Wilde

Alpha is a data parallel functional language which has the capability of specifying algorithms at a very high level. Our ultimate objective is to generate efficient parallel imperative code from an Alpha program. In this paper, we discuss the related problem of generating efficient single processor imperative code. Analysis techniques that were developed for the synthesis of systolic arrays are ext...


Formally Secure Compilation of Unsafe Low-Level Components

We propose a new formal criterion for secure compilation, providing strong security guarantees for components written in unsafe, low-level languages with C-style undefined behavior. Our criterion goes beyond recent proposals, which protect the trace properties of a single component against an adversarial context, to model dynamic compromise in a system of mutually distrustful components. Each c...


Modelling Program Compilation in the Refinement Calculus

We show how compilation of high-level language programs to assembler code can be formally represented in the refinement calculus. New operators are introduced to widen the modelling language to encompass assembler code. A compilation strategy is then embodied as a set of derived refinement rules.


When Good Components Go Bad: Formally Secure Compilation Despite Dynamic Compromise

We propose a new formal criterion for secure compilation, giving strong end-to-end security guarantees for software components written in unsafe, low-level languages with C-style undefined behavior. Our criterion is the first to model dynamic compromise in a system of mutually distrustful components running with least privilege. Each component is protected from all the others—in particular, fro...


Semantics and Compilation of Sequential Streams into a Static SIMD Code for the Declarative Data-parallel Language 8 1/2 a Denotational Semantics for Recursive Streams and Its Implementation

Abstract. 8 1/2 is a high-level data-parallel language built on the concepts of stream and collection. In this research report, we present the semantics and compilation of sequential 8 1/2 streams of collections. First, we give these streams a denotational semantics and describe the corresponding fixed-point computation. Following this...




Publication date: 2016